GPU accelerated maximum cardinality matching algorithms for bipartite graphs
We design, implement, and evaluate GPU-based algorithms for the maximum
cardinality matching problem in bipartite graphs. Such algorithms have a
variety of applications in computer science, scientific computing,
bioinformatics, and other areas. To the best of our knowledge, ours is the
first study that focuses on GPU implementations of maximum cardinality
matching algorithms. We compare the proposed algorithms with serial and
multicore implementations from the literature on a large set of real-life
problems; in the majority of cases, one of our GPU-accelerated algorithms
is demonstrated to be faster than both the sequential and multicore
implementations.
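The abstract does not reproduce the algorithms themselves. As background, the classic sequential approach finds a maximum matching by repeatedly searching for augmenting paths (Kuhn's algorithm); the following is a minimal sketch of that idea, not the authors' GPU implementation — the function name and adjacency-list format are illustrative:

```python
def max_bipartite_matching(adj, n_left, n_right):
    """Maximum cardinality matching via augmenting paths (Kuhn's algorithm).

    adj[u] lists the right-side vertices adjacent to left vertex u.
    Returns (matching size, match_r) where match_r[v] is the left vertex
    matched to right vertex v, or -1 if v is unmatched.
    """
    match_r = [-1] * n_right

    def try_augment(u, visited):
        # Try to match u, possibly re-matching previously matched vertices.
        for v in adj[u]:
            if v not in visited:
                visited.add(v)
                if match_r[v] == -1 or try_augment(match_r[v], visited):
                    match_r[v] = u
                    return True
        return False

    size = 0
    for u in range(n_left):
        if try_augment(u, set()):
            size += 1
    return size, match_r
```

Each augmenting-path search is O(|E|), giving O(|V|·|E|) overall; parallel and GPU variants of this search are precisely what makes the problem interesting at scale.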
Combinatorial Problems in High-Performance Computing: Partitioning
This extended abstract presents a survey of combinatorial problems
encountered in scientific computations on today's
high-performance architectures, with sophisticated memory
hierarchies, multiple levels of cache, and multiple processors
on chip as well as off-chip.
For parallelism, the most important problem is to partition
sparse matrices, graphs, or hypergraphs into nearly equal-sized
parts while trying to reduce inter-processor communication.
Common approaches to such problems involve multilevel
methods based on coarsening and uncoarsening (hyper)graphs,
matching of similar vertices, searching for good separator sets
and good splittings, dynamic adjustment of load imbalance,
and two-dimensional matrix splitting methods.
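One building block mentioned above, matching of similar vertices during coarsening, is commonly realized by the heavy-edge heuristic: each vertex is greedily paired with its heaviest unmatched neighbor so that heavy edges are collapsed first. A minimal sketch under assumed weighted-adjacency input (not any particular partitioner's code) might look like:

```python
def heavy_edge_matching(adj):
    """One coarsening pass of the heavy-edge matching heuristic.

    adj[u] is a dict {v: edge_weight}. Returns match, where match[u] is
    the partner of u, or u itself if the vertex stays unmatched.
    """
    n = len(adj)
    match = [-1] * n
    for u in range(n):
        if match[u] != -1:
            continue  # already matched by an earlier vertex
        best, best_w = u, -1  # default: stay single
        for v, w in adj[u].items():
            if v != u and match[v] == -1 and w > best_w:
                best, best_w = v, w
        match[u] = best
        match[best] = u if best != u else best
    return match
```

Each matched pair is then contracted into a single coarse vertex, and the process repeats until the graph is small enough to partition directly.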
Parallelization of Mapping Algorithms for Next Generation Sequencing Applications
With the advent of next-generation high throughput sequencing
instruments, large volumes of short sequence data are generated at an
unprecedented rate. Processing and analyzing these massive data
requires overcoming several challenges. A particular challenge
addressed in this abstract is the mapping of short sequences (reads)
to a reference genome while allowing mismatches. This is a significantly
time-consuming combinatorial problem in many applications, including
whole-genome resequencing, targeted sequencing, transcriptome/small
RNA, DNA methylation, and ChIP sequencing, and takes time on the order
of days using existing sequential techniques on large-scale
datasets. In this work, we introduce six parallelization methods, each
with different scalability characteristics, to speed up short sequence
mapping. We also address an associated load balancing problem that
involves grouping nodes of a tree from different levels. This problem
arises due to a trade-off between computational cost and granularity
while partitioning the workload. We comparatively present the
proposed parallelization methods and give theoretical cost models for
each of them. Experimental results on real datasets demonstrate the
effectiveness of the methods and indicate that they are successful at
reducing the execution time from the order of days to just a few
hours for large datasets.
To the best of our knowledge, this is the first study on
parallelization of the short sequence mapping problem.
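The core combinatorial task here, mapping a read with up to k mismatches, is classically attacked with the pigeonhole principle: splitting the read into k+1 pieces guarantees at least one piece matches the genome exactly, so exact matches of the pieces seed candidate positions that are then verified. The sketch below illustrates that sequential idea only (the abstract's contribution is its parallelization, which is not shown); all names are illustrative:

```python
def map_read(genome, read, max_mismatches):
    """Naive pigeonhole seed-and-verify mapping with mismatches.

    Splits the read into max_mismatches + 1 pieces, finds exact
    occurrences of each piece, and verifies candidate alignments by
    Hamming distance. Returns sorted candidate start positions.
    """
    k = max_mismatches
    n, m = len(genome), len(read)
    piece = max(1, m // (k + 1))
    hits = set()
    for i in range(k + 1):
        seed = read[i * piece:(i + 1) * piece]
        if not seed:
            continue
        start = genome.find(seed)
        while start != -1:
            pos = start - i * piece  # implied read start for this seed
            if 0 <= pos <= n - m:
                mism = sum(a != b for a, b in zip(genome[pos:pos + m], read))
                if mism <= k:
                    hits.add(pos)
            start = genome.find(seed, start + 1)
    return sorted(hits)
```

Real mappers replace the linear scan with index structures (suffix arrays, FM-indexes, hash tables); the verification step is what the parallelization methods distribute across workers.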
A Survey of Pipelined Workflow Scheduling: Models and Algorithms
A large class of applications needs to execute the same workflow on different data sets of identical size. Efficient execution of such applications necessitates intelligent distribution of the application components and tasks on a parallel machine, and the execution can be orchestrated by utilizing task-, data-, pipelined-, and/or replicated-parallelism. The scheduling problem that encompasses all of these techniques is called pipelined workflow scheduling, and it has been widely studied in the last decade. Multiple models and algorithms have flourished to tackle various programming paradigms, constraints, machine behaviors, or optimization goals. This paper surveys the field by summing up and structuring known results and approaches.
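A simple instance of pipelined workflow scheduling, useful for intuition though not drawn from the survey itself, is interval mapping of a linear task chain onto p processors: consecutive tasks go to the same processor, and throughput is limited by the most loaded one, so the objective is to minimize that bottleneck. A small dynamic-programming sketch (illustrative names and model):

```python
def min_period(task_times, p):
    """Minimum pipeline period for interval mapping of a linear chain.

    dp[j][i] = minimum bottleneck load when the first i tasks are
    assigned, as contiguous intervals, to j processors.
    """
    n = len(task_times)
    prefix = [0] * (n + 1)
    for i, t in enumerate(task_times):
        prefix[i + 1] = prefix[i] + t
    INF = float("inf")
    dp = [[INF] * (n + 1) for _ in range(p + 1)]
    dp[0][0] = 0
    for j in range(1, p + 1):
        for i in range(n + 1):
            for s in range(i + 1):  # tasks s..i-1 form processor j's interval
                cost = max(dp[j - 1][s], prefix[i] - prefix[s])
                dp[j][i] = min(dp[j][i], cost)
    return dp[p][n]
```

The surveyed models generalize this in many directions: DAG-shaped workflows, communication costs, replication, and heterogeneous machines.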
Finding the Hierarchy of Dense Subgraphs using Nucleus Decompositions
Finding dense substructures in a graph is a fundamental graph mining
operation, with applications in bioinformatics, social networks, and
visualization to name a few. Yet most standard formulations of this problem
(like clique, quasiclique, k-densest subgraph) are NP-hard. Furthermore, the
goal is rarely to find the "true optimum", but to identify many (if not all)
dense substructures, understand their distribution in the graph, and ideally
determine relationships among them. Current dense subgraph finding algorithms
usually optimize some objective, and only find a few such subgraphs without
providing any structural relations. We define the nucleus decomposition of a
graph, which represents the graph as a forest of nuclei. Each nucleus is a
subgraph where smaller cliques are present in many larger cliques. The forest
of nuclei is a hierarchy by containment, where the edge density increases as we
proceed towards leaf nuclei. Sibling nuclei can have limited intersections,
which enables discovering overlapping dense subgraphs. With the right
parameters, the nucleus decomposition generalizes the classic notions of
k-cores and k-truss decompositions. We give provably efficient algorithms for
nucleus decompositions, and empirically evaluate their behavior in a variety of
real graphs. The tree of nuclei consistently gives a global, hierarchical
snapshot of dense substructures, and outputs dense subgraphs of higher quality
than other state-of-the-art solutions. Our algorithm can process graphs with
tens of millions of edges in less than an hour.
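Since the nucleus decomposition generalizes k-cores, the classic peeling algorithm for core numbers is a useful reference point: repeatedly remove a minimum-degree vertex, recording the (monotone) degree at removal time. A minimal sketch of that standard algorithm, not the paper's nucleus-decomposition code:

```python
import heapq

def core_numbers(adj):
    """k-core decomposition by min-degree peeling.

    adj is a list of neighbor sets. Returns the core number of each
    vertex: the largest k such that the vertex survives in the k-core.
    """
    n = len(adj)
    deg = [len(a) for a in adj]
    core = [0] * n
    removed = [False] * n
    heap = [(deg[u], u) for u in range(n)]
    heapq.heapify(heap)
    k = 0
    while heap:
        d, u = heapq.heappop(heap)
        if removed[u] or d != deg[u]:
            continue  # stale heap entry from an earlier degree
        removed[u] = True
        k = max(k, deg[u])  # core numbers are non-decreasing in peel order
        core[u] = k
        for v in adj[u]:
            if not removed[v]:
                deg[v] -= 1
                heapq.heappush(heap, (deg[v], v))
    return core
```

Nucleus decomposition replaces "vertex with its degree" by "small clique with the number of larger cliques containing it", which yields the denser, hierarchically nested subgraphs described above.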
MICA: microRNA integration for active module discovery
A successful approach to disease-specific module discovery is the integration of gene expression data with the protein-protein interaction (PPI) network. Although many algorithms have been developed for this purpose, they focus only on the network genes (mostly the well-connected ones), entirely neglecting genes whose interactions are partially or completely unknown. In addition, they make use only of gene expression data, which does not give a complete picture of actual protein expression levels: the cell uses different mechanisms, such as microRNAs, to post-transcriptionally regulate proteins without affecting the corresponding genes' expression. Due to this complexity, relying on a single data type is unlikely to reveal the correct module(s). Today, the unprecedented amount of publicly available disease-related heterogeneous data encourages the development of new methodologies to better understand complex diseases.
In this work, we propose a novel workflow, Mica, which, to the best of our knowledge, is the first study integrating miRNA, mRNA, and PPI information to identify disease-specific gene modules. The novelty of Mica lies in several aspects, such as the early modification of mRNA expression with microRNA data to better highlight indirect dependencies between genes. We applied Mica to microRNA-Seq and mRNA-Seq data sets of invasive ductal carcinoma and invasive lobular carcinoma samples from the Cancer Genome Atlas Project (TCGA). The Mica modules are shown to unravel new and interesting dependencies between genes. Additionally, the modules accurately differentiate between case and control samples while being highly enriched with disease-specific pathways and genes.